BIJUNG:13.6.1 비정상성(Non-stationarity) 문제: 하위 정책이 변할 때 상위 정책이 겪는 학습 불안정성 해결 (Hindsight Replay 활용 등)